SYSTEM AND METHOD OF MANAGING AND MONITORING CLUSTER AND GRID RESOURCES

BACKGROUND OF THE INVENTION 

1. Field of the Invention

[0001] The present invention relates to a resource management system and more 
specifically to a system and method of managing and monitoring cluster resources. 

2. Introduction 

[0002] Managers of clusters desire maximum return on investment, which often means high
system utilization and the ability to deliver various qualities of service to various users
and groups. A cluster is typically defined as a parallel computer that is constructed of
commodity components and runs commodity software as its system software. A cluster
contains nodes, each containing one or more processors, memory that is shared by all of
the processors in the respective node, and additional peripheral devices such as storage
disks, that are connected by a network that allows data to move between nodes.
[0003] The managers of such clusters need to understand how the available resources 
are being delivered to the various users over time and need the ability to have the 
administrators tune 'cycle delivery' to satisfy the current site mission objectives.
[0004] How well a scheduler succeeds can only be determined if various metrics are
established and a means to measure these metrics is available. While statistics are
important, their value is limited unless optimal statistical values are also known for the 
current environment including workload, resources, and policies. If one could determine 
that a site's typical workload obtained an average queue time of 3 hours on a particular 
system, this would be a good statistic. However, if one knew that through proper tuning, 
the system could deliver an average queue time of 1.2 hours with minimal negative side 
effects, this would be valuable knowledge. 
[0005] The present invention was developed to address these issues. At its core, the
invention provides a number of software tools designed to truly manage cluster resources 
and provide meaningful information about what is actually happening on the system. 
The inventions were created to satisfy real-world needs of a batch system administrator 
as he or she tries to balance the needs of users, staff, and managers. 

DETAILED DESCRIPTION OF THE INVENTION 

[0006] The details of the present invention will be understood with reference to the 
associated documents attached as Appendix A hereto and further includes a CD 
according to 37 C.F.R. 1.54(e) and 1.96. There are two copies of the CD (Copy 1 and 
Copy 2). Each copy contains an identical set of documents. The enclosed CD
Listing of Documents will set forth the documents on the CD. The set of documents is 
all the source code to the MOAB™ workload manager. Each document contained on 
the CDs is incorporated herein by reference into this patent application. 
[0007] Embodiments within the scope of the present invention may also include 
computer-readable media for carrying or having computer-executable instructions or data 
structures stored thereon. Such computer-readable media can be any available media that 
can be accessed by a general purpose or special purpose computer. By way of example, 
and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, 
CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage 
devices, or any other medium which can be used to carry or store desired program code 
means in the form of computer-executable instructions or data structures. When
information is transferred or provided over a network or another communications 
connection (either hardwired, wireless, or combination thereof) to a computer, the 
computer properly views the connection as a computer-readable medium. Thus, any
such connection is properly termed a computer-readable medium. Combinations of the
above should also be included within the scope of the computer-readable media. 
[0008] Computer-executable instructions include, for example, instructions and data 
which cause a general purpose computer, special purpose computer, or special purpose 
processing device to perform a certain function or group of functions.
Computer-executable instructions also include program modules that are executed by computers in
stand-alone or network environments. Generally, program modules include routines, 
programs, objects, components, and data structures, etc. that perform particular tasks or 
implement particular abstract data types. Computer-executable instructions, associated 
data structures, and program modules represent examples of the program code means 
for executing steps of the methods disclosed herein. The particular sequence of such 
executable instructions or associated data structures represents examples of 
corresponding acts for implementing the functions described in such steps. 
[0009] Those of skill in the art will appreciate that other embodiments of the invention 
may be practiced in network computing environments with many types of computer 
system configurations, including personal computers, hand-held devices, multi-processor 
systems, microprocessor-based or programmable consumer electronics, network PCs, 
minicomputers, mainframe computers, and the like. Embodiments may also be practiced 
in distributed computing environments where tasks are performed by local and remote 
processing devices that are linked (either by hardwired links, wireless links, or by a 
combination thereof) through a communications network. In a distributed computing 
environment, program modules may be located in both local and remote memory storage 
devices. 
CLAIMS

1. A method of co-allocating resources within a compute environment, the method 
comprising: 

receiving a first request for a reservation for a first type of resource; 
analyzing constraints and guarantees associated with the first request; 
identifying a first group of resources that meet the first request; 
receiving a second request for a reservation for a second type of resource; 
analyzing constraints and guarantees associated with the second request; 
identifying a second group of resources that meet the second request; and
generating a co-allocation map between the first group of resources and the second group of 
resources. 

2. The method of claim 1, further comprising reserving resources according to the generated 
co-allocation map. 

3. The method of claim 1 or 2, wherein generating the co-allocation map comprises identifying 
a reduced map of quantities of resources that can simultaneously satisfy the first request and the
second request. 

4. The method of claim 3, wherein the co-allocation map comprises all time frames where 
available resources exist that satisfy the first request and the second request. 

5. The method of any one of the preceding claims, wherein possible types of resources 
comprise at least one of: compute resources, disk storage resources, network bandwidth resources, 
memory resources, licensing resources. 

6. The method of claim 1, wherein generating the co-allocation map further comprises 
identifying an intersection of the availability of each of the first type of resource and the second type 
of resource. 

7. The method of claim 6, wherein generating the co-allocation map further comprises 
determining intersecting time frames in which both the first request and the second request may be 
simultaneously satisfied. 
8. The method of claim 7, further comprising:

generating a resulting array of events describing the intersecting time frames. 

9. The method of claim 8, wherein the resulting array of events comprises at least one of
resource quantity, resource quality, time frames, quality of information and cost. 

10. The method of any one of the preceding claims, wherein the first request and the second 
request comprise at least one of: a job description, at least one time frame availability, a description 
of minimum resources, a description of resource types and attributes, a reservation duration 
minimum. 

11. The method of any one of the preceding claims, wherein identifying the first group of
resources and the second group of resources further comprises analyzing events associated with the 
first request and the second request and how resource availability changes over time. 

12. The method of claim 11, wherein the events comprise at least one of job start, job
completion, state change, boundaries, reservations and policy enforcement limits. 

13. The method of any one of the preceding claims, further comprising reporting at least one of 
the following parameters associated with the identified first and second group of resources: cost, 
quality of information data, resource quantity data, time frame data, and resource quality data. 

14. The method of any one of the preceding claims, further comprising: 
performing again, under constraints identified by the co-allocation map, the step of
identifying a first group of resources that meet the request for the first type of resource.

15. The method of claim 14, further comprising: 

performing again, under constraints identified by the co-allocation map, the step of 
identifying a second group of resources that meet the request for the second type of resource. 

16. The method of any one of the preceding claims, wherein: 

receiving a request for a reservation for a first type of resource further comprises receiving a 
request for the first type of resource for a first time frame, and wherein the identifying and 
analyzing steps for the first type of resource take into account the first time frame; 
receiving a request for a reservation for a second type of resource further comprises
receiving a request for the second type of resource for a second time frame, wherein the identifying 
and analyzing steps for the second type of resource take into account the second time frame; and

generating the co-allocation map between the first group of resources and the second group 
of resources further comprises calculating an intersection of the first time frame and the second 
time frame. 

17. The method of any one of the preceding claims, wherein the constraints are at least one of
resource matching in terms of type, attribute or quantity. 

18. The method of any one of the preceding claims, wherein the constraints and guarantees 
associated with the first request and the second request relate to resource-based policies.

19. The method of any one of claims 1 to 17, wherein the constraints and guarantees associated 
with the first request and the second request relate to time-based policies. 

20. The method of claim 19, wherein the time-based policies limit requestors to a
pre-determined quantity of resources at any given moment in time.

21. The method of any one of the preceding claims, wherein receiving a request for a 
reservation for a first type of resource further comprises receiving a request for a reservation for the 
first type of resource having an attribute. 

22. The method of claim 21, wherein the attribute is at least one of disk storage space, memory,
license scope, network bandwidth capability, clock speed and central processing power. 

23. The method of any one of the preceding claims, wherein the co-allocation map is computed 
as one of an intersection, a union or a distinct response. 

24. The method of claim 23, further comprising, before reserving compute resources, 
presenting to a requestor of a reservation of the first and second type of resources an analysis of the 
compute resources and a possible reservation. 

25. The method of claim 24, wherein the presented analysis relates to a quantity and quality of
the compute resources in relation to the request for a reservation for resources. 

26. The method of claim 25, further comprising: 

receiving from the requestor of a reservation a revised request for resources based on the 
presented analysis. 

27. The method of any one of claims 23 to 26, wherein a requestor may select that generating 
the co-allocation map returns an analysis according to at least one of the intersection, union or
distinct response. 

28. The method of claim 27, wherein the analysis returned to the requestor, according to at least 
one of the intersection, union or distinct response, corresponds to an analysis of the quantity of
resources and a degree of fulfillment of the request according to available resources. 

29. The method of claim 28, wherein the analysis returned to the requestor further comprises a 
list of resources that can fulfill the request of the requestor. 

30. The method of claim 28, wherein the analysis returned to the requestor further comprises a 
transaction ID associated with the analysis. 

31. The method of claim 30, further comprising presenting to the requestor an option to submit 
the request with reference to the transaction ID. 

32. The method of any one of the preceding claims, wherein the generated co-allocation map 
represents a set of resources exclusive to at least one of the first request or the second request.

33. The method of claim 32, wherein the first request specifies exclusivity. 

34. The method of claim 33, further comprising: 

guaranteeing that the first request will be able to reserve exclusive resources. 

35. The method of claim 32, further comprising generating a co-allocation map between the 
first group of resources and the second group of resources. 

36. A method of co-allocating resources within a compute environment, the method 
comprising: 

receiving a first request for a reservation for a first type of resource; 
analyzing constraints and guarantees associated with the first request; 
identifying a first group of resources that meet the request for the first type of resource; 
receiving a second request for a reservation for a second type of resource; 
analyzing constraints and guarantees associated with the second request; 
identifying a second group of resources that meet the request for the second type of 
resource; and 
generating a set of resources exclusive to at least one of the first request or the second
request.

37. The method of claim 36, further comprising generating a co-allocation map between the 
first group of resources and the second group of resources. 
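
By way of illustration only, and not limitation, the determination of intersecting time frames recited in claims 6, 7 and 16 may be sketched as follows. The sketch is an assumption and is not taken from the disclosure: the names Window and intersect_windows, the representation of a time frame as a (start, end) pair of integers, and the requirement that the time frames within each availability list do not overlap one another are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: a time frame is (start, end), with end exclusive.
Window = Tuple[int, int]


def intersect_windows(first: List[Window], second: List[Window]) -> List[Window]:
    """Return the time frames during which both availability lists overlap.

    Assumes the windows inside each input list do not overlap one another.
    """
    first = sorted(first)
    second = sorted(second)
    i = j = 0
    result: List[Window] = []
    while i < len(first) and j < len(second):
        # The overlap of two windows runs from the later start to the earlier end.
        start = max(first[i][0], second[j][0])
        end = min(first[i][1], second[j][1])
        if start < end:
            result.append((start, end))
        # Advance past whichever window finishes first.
        if first[i][1] < second[j][1]:
            i += 1
        else:
            j += 1
    return result


# Availability of the first type of resource and of the second type of resource.
compute_available = [(0, 10), (20, 30)]
network_available = [(5, 25)]
co_allocation_frames = intersect_windows(compute_available, network_available)
```

In this example both requests may be simultaneously satisfied only during the time frames (5, 10) and (20, 25). A fuller co-allocation map would presumably also carry resource quantities for each time frame, which this sketch omits.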
[0016] FIG. 1B illustrates an access control list which provides access to resources within a
compute environment; 

[0017] FIG. 2A illustrates a plurality of reservations made for compute resources; 

[0018] FIG. 2B illustrates a plurality of reservations and jobs submitted within those reservations; 

[0019] FIG. 3 illustrates a dynamic access control list; 

[0020] FIG. 4 illustrates a reservation creation window; 

[0021] FIG. 5 illustrates the co-allocation process; and

[0022] FIG. 6 illustrates a method aspect of the invention. 

[0023] 

DETAILED DESCRIPTION OF THE INVENTION 

[0024] Various embodiments of the invention are discussed in detail below. While specific 
implementations are discussed, it should be understood that this is done for illustration purposes 
only. A person skilled in the relevant art will recognize that other components and configurations
may be used without departing from the spirit and scope of the invention.

[0025] The "system" embodiment of the invention may comprise a computing device that includes 
the necessary hardware and software components to enable a workload manager or a software 
module performing the steps of the invention. Such a computing device may include such known 
hardware elements as one or more central processors, random access memory (RAM), read-only 
memory (ROM), storage devices such as hard disks, communication means such as a modem or a 
card to enable networking with other computing devices, a bus that provides data transmission 
between various hardware components, a keyboard, a display, an operating system and so forth. 
There is no restriction that the particular system embodiment of the invention have any specific 
hardware components and any known or future developed hardware configurations are 
contemplated as within the scope of the invention when the computing device operates as is 
claimed. 

[0026] The present invention relates to reservations of resources within the context of a compute 
environment. One example of a compute environment is a cluster. The cluster may be, for 
example, a group of computing devices operated by a hosting facility, a hosting center, a virtual 
hosting center, a data center, grid and/or utility-based computing environments. Every reservation
consists of three major components: a set of resources, a timeframe, and an access control list 
(ACL). Additionally, a reservation may also have a number of optional attributes controlling its 
behavior and interaction with other aspects of scheduling. A reservation's ACL specifies which jobs
request for resources based on the returned analysis, if for example, the analysis shows that there are
no resources available within a day and the user desires to start jobs earlier than that time. 
[0086] Another aspect of the co-allocation of reservations relates to identifying a request with a flag 
related to exclusion. This is an option identified in the reservation flags options 408 in FIG. 4. The 
exclusive flag or parameter may be identified with one or more requests for resources. In this 
regard, the generated map of co-allocated resources represents a set of resources exclusive to at least
one of the first request or the second request. With an exclusive parameter attached to the first 
request, the method may further include guaranteeing that the first request will be able to reserve 
exclusive resources. The method may further comprise generating a co-allocation map between the 
first group of resources and the second group of resources. 

[0087] An example will assist in understanding the co-allocation exclusion request. With reference to
FIG. 5, assume a first request R1 requests a first set of resources and R2 requests a second set of
resources. At least one request is set with the exclusive flag. Setting this flag changes the analysis
and mapping and the end results when a reservation is committed. The exclusive analysis includes
analyzing R1 during the co-allocation time frame identified and bounded in the union analysis 532
and identifying an R1 list 1 and an R1 list 2 bounded by the union time frame 532, and identifying
an R2 list 1 and an R2 list 2 in the bounded time frame 532. (There may also be time frames for
mappings 528 and 530 with analysis as well.) These lists are generated by an analysis called a
multi-request distribute balance tasks, which steps through and provides resources to the lists
according to the most needy request. The result of the analysis is a transaction ID (1) for R1 which
represents available resources for R1 during the union time frame 532 and a transaction ID (2) for
R2 representing available resources for R2 for the time frame 532. But the underlying analysis, and
what is hidden inside these transaction IDs, is a mapping in each case to an explicit set of resources.
If the user then commits the co-allocation request, the reservation for this query will reserve two
different explicit sets of resources, associated with the two transaction IDs, that are guaranteed not
to overlap.
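
By way of example, and not limitation, the exclusive distribution described above may be sketched as follows. The sketch is an assumption rather than the disclosed implementation: the function name distribute_exclusive, the use of node names as resource identifiers, and the particular measure of which request is "most needy" (the request with the least spare capacity among its still-free candidate resources) are all hypothetical. The greedy approach shown is a simplification and is not guaranteed to find a distribution in every case where one exists.

```python
from typing import Dict, List, Set


def distribute_exclusive(
    candidates: Dict[str, Set[str]], needed: Dict[str, int]
) -> Dict[str, List[str]]:
    """Give each request its own explicit, non-overlapping set of resources.

    candidates maps a request name to the resources that could satisfy it;
    needed maps a request name to how many resources it requires.
    """
    assigned: Dict[str, List[str]] = {name: [] for name in candidates}
    taken: Set[str] = set()

    def free(name: str) -> Set[str]:
        # Candidate resources not yet handed to any request.
        return candidates[name] - taken

    def slack(name: str) -> int:
        # Hypothetical neediness measure: fewer spare candidates = more needy.
        return len(free(name)) - (needed[name] - len(assigned[name]))

    while True:
        pending = [n for n in candidates if len(assigned[n]) < needed[n]]
        if not pending:
            return assigned
        neediest = min(pending, key=lambda n: (slack(n), n))
        options = free(neediest)
        if not options:
            raise ValueError(f"cannot satisfy request {neediest} exclusively")

        def contention(resource: str) -> int:
            # Number of other pending requests that could also use this resource.
            return sum(
                1 for other in pending
                if other != neediest and resource in candidates[other]
            )

        # Prefer the resource that competes least with the other requests.
        choice = min(options, key=lambda r: (contention(r), r))
        assigned[neediest].append(choice)
        taken.add(choice)


lists = distribute_exclusive(
    {"R1": {"n1", "n2", "n3"}, "R2": {"n2", "n3", "n4"}},
    {"R1": 2, "R2": 2},
)
```

In this example R1 receives n1 and n2 while R2 receives n3 and n4, so the two sets do not overlap. Each returned list plays the role of the explicit set of resources that the text above describes as hidden behind a transaction ID.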

[0088] With this in mind, an aspect of the exclusion embodiment of the invention relates to a 
method of co-allocating resources within a compute environment. The method comprises receiving 
a first request for a reservation for a first type of resource, analyzing constraints and guarantees 
associated with the first request, identifying a first group of resources that meet the request for the 
first type of resource, receiving a second request for a reservation for a second type of resource, 
analyzing constraints and guarantees associated with the second request, identifying a second group 
of resources that meet the request for the second type of resource and generating a set of resources 
exclusive to at least one of the first request or the second request. 